Creating a Dual-Purpose Treebank
نویسندگان
چکیده
We describe the background for and building of IcePaHC, a one million word parsed historical corpus of Icelandic which has just been finished. This corpus which is completely free and open contains fragments of 60 texts ranging from the late 12 century to the present. We describe the text selection and text collecting process and discuss the quality of the texts and their conversion to modern Icelandic spelling. We explain why we choose to use a phrase structure Penn style annotation scheme and briefly describe the syntactic annotation process. Furthermore, we advocate the importance of an open source policy as regards language resources.
منابع مشابه
Creating a Tree Adjoining Grammar from a Multilayer Treebank
We propose a method for the extraction of a Tree Adjoining Grammar (TAG) from a dependency treebank which has some representative examples annotated with phrase structures. We show that the resulting TAG along with corresponding dependency structure can be used to convert a dependency treebank to a TAG-based phrase structure treebank.
متن کاملITU Treebank Annotation Tool
In this paper, we present a treebank annotation tool developed for processing Turkish sentences. The tool consists of three different annotation stages; morphological analysis, morphological disambiguation and syntax analysis. Each of these stages are integrated with existing analyzers in order to guide human annotators. Our semiautomatic treebank annotation tool is currently used both for crea...
متن کاملHindi CCGbank: CCG Treebank from the Hindi Dependency Treebank
In this paper, we present an approach for automatically creating a Combinatory Categorial Grammar (CCG) treebank from a dependency treebank for the Subject-Object-Verb language Hindi. Rather than a direct conversion from dependency trees to CCG trees, we propose a two stage approach: a language independent generic algorithm first extracts a CCG lexicon from the dependency treebank. A determinis...
متن کاملDomain Adaptation with Artificial Data for Semantic Parsing of Speech
We adapt a semantic role parser to the domain of goal-directed speech by creating an artificial treebank from an existing text treebank. We use a three-component model that includes distributional models from both target and source domains. We show that we improve the parser’s performance on utterances collected from human-machine dialogues by training on the artificially created data without l...
متن کاملCreating a Dependency Syntactic Treebank: Towards Intuitive Language Modeling
In this paper we present a user-centered approach for defining the dependency syntactic specification for a treebank. We show that by collecting information on syntactic interpretations from the future users of the treebank, we can model so far dependency-syntactically undefined syntactic structures in a way that corresponds to the users’ intuition. By consulting the users at the grammar defini...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JLCL
دوره 26 شماره
صفحات -
تاریخ انتشار 2011